Abstract — This paper describes a framework in which directed 
information is defined on abstract spaces. The framework is 
employed to derive properties of directed information such as 
convexity, concavity, lower semicontinuity, by using the topology 
of weak convergence of probability measures on Polish spaces. 
Two extremum problems of directed information related to 
capacity of channels with memory and feedback, and causal 
and sequential rate distortion are extensively analyzed showing 
existence of maximizing and minimizing solutions, respectively. 

I. Introduction 
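Directed information from a sequence of Random Variables (RV's) $X^n = \{X_0, X_1, \ldots, X_n\} \in \mathcal{X}_{0,n} = \times_{i=0}^{n} \mathcal{X}_i$ to another sequence $Y^n = \{Y_0, Y_1, \ldots, Y_n\} \in \mathcal{Y}_{0,n} = \times_{i=0}^{n} \mathcal{Y}_i$ is often defined via [1], [2]

$$I(X^n \rightarrow Y^n) = \sum_{i=0}^{n} I(X^i; Y_i | Y^{i-1}) \tag{1}$$

$$= \sum_{i=0}^{n} \int \log \left( \frac{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)}{P_{Y_i|Y^{i-1}}(dy_i|y^{i-1})} \right) P_{X^i,Y^i}(dx^i,dy^i) \tag{2}$$

$$\equiv \mathbb{I}_{X^n \rightarrow Y^n}(P_{X_i|X^{i-1},Y^{i-1}}, P_{Y_i|Y^{i-1},X^i} : i = 0, \ldots, n). \tag{3}$$

Since the joint distribution in (2) is decomposed via $P_{X^i,Y^i}(dx^i,dy^i) = \otimes_{j=0}^{i} P_{X_j|X^{j-1},Y^{j-1}}(dx_j|x^{j-1},y^{j-1}) \otimes P_{Y_j|Y^{j-1},X^j}(dy_j|y^{j-1},x^j)$, the notation $\mathbb{I}_{X^n \rightarrow Y^n}(\cdot,\cdot)$ denotes the functional dependence on two collections of causal conditional distributions $\{P_{X_i|X^{i-1},Y^{i-1}}(\cdot|\cdot,\cdot) : i = 0, 1, \ldots, n\}$ and $\{P_{Y_i|Y^{i-1},X^i}(\cdot|\cdot,\cdot) : i = 0, 1, \ldots, n\}$.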
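For finite alphabets the integrals in (1)-(2) reduce to sums, which makes the definition easy to experiment with. The following Python sketch is an illustration under that assumption and is not part of the framework above: the helper names `marginal`, `directed_information` and `mutual_information`, as well as the two toy joint distributions, are hypothetical. It evaluates (1) from a joint probability mass function and contrasts it with mutual information on an example in which dependence arises only through feedback.

```python
import math
from collections import defaultdict


def marginal(joint, i, j):
    """Marginal pmf of (x_0..x_i, y_0..y_j); use -1 to drop a sequence entirely."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(x[:i + 1], y[:j + 1])] += p
    return out


def directed_information(joint, n):
    """Finite-alphabet version of (1): sum_i I(X^i; Y_i | Y^{i-1}), in nats.

    `joint` maps (x^n, y^n), given as two tuples of length n + 1, to a probability.
    """
    total = 0.0
    for i in range(n + 1):
        p_xy = marginal(joint, i, i)           # P(x^i, y^i)
        p_xy_prev = marginal(joint, i, i - 1)  # P(x^i, y^{i-1})
        p_y = marginal(joint, -1, i)           # P(y^i)
        p_y_prev = marginal(joint, -1, i - 1)  # P(y^{i-1})
        for (x, y), p in p_xy.items():
            if p > 0.0:
                num = p * p_y_prev[((), y[:-1])]
                den = p_xy_prev[(x, y[:-1])] * p_y[((), y)]
                total += p * math.log(num / den)
    return total


def mutual_information(joint):
    """Ordinary mutual information I(X^n; Y^n), in nats, for comparison."""
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return sum(p * math.log(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0.0)


# Toy example 1 (assumed): two i.i.d. uniform bits sent over a noiseless channel.
noiseless = {((x0, x1), (x0, x1)): 0.25 for x0 in (0, 1) for x1 in (0, 1)}

# Toy example 2 (assumed): outputs are pure noise, but X_1 copies Y_0 via feedback.
feedback = {((0, y0), (y0, y1)): 0.25 for y0 in (0, 1) for y1 in (0, 1)}
```

In the `noiseless` example both quantities equal $2\log 2$ nats. In the `feedback` example the input $X_1$ copies the previous output $Y_0$, so $I(X^1;Y^1) = \log 2$ while $I(X^1 \rightarrow Y^1) = 0$: the dependence flows from output to input only.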
In information theory, directed information (1)-(3) or its variants are used to characterize capacity of channels with memory and feedback [3], [4], [5], lossy data compression with feedforward information at the decoder [6], lossy data compression of sequential codes [4], lossy data compression of causal codes [7], and capacity of networks such as the two-way channel, multiple access channel [8], [9], etc. The previous references derive coding theorems based on a) stationary ergodic processes $\{(X_i,Y_i)\}_{i=0}^{\infty}$, b) Dobrushin's stability of the information density $\log \frac{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)}{P_{Y_i|Y^{i-1}}(dy_i|y^{i-1})}$, and c) information spectrum methods.



Capacity with Feedback. Based on a) or b), the operational definition of capacity of channels with memory and feedback is given by

$$C^{fb}(P) \triangleq \lim_{n \rightarrow \infty} \sup_{\overleftarrow{P}_{X^n|Y^{n-1}} \in \mathcal{P}(P)} \frac{1}{n+1} I(X^n \rightarrow Y^n)$$

where $\mathcal{P}(P)$ denotes the power constraint set, and

$$\overleftarrow{P}_{X^n|Y^{n-1}}(dx^n|y^{n-1}) = \otimes_{i=0}^{n} P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1},y^{i-1}).$$

Sequential and Causal Rate Distortion. Based on a) or b), the operational definition of the sequential and causal rate distortion function is given by

$$R^{c}(D) = \lim_{n \rightarrow \infty} \inf_{\overrightarrow{P}_{Y^n|X^n} \in \overrightarrow{\mathcal{Q}}(D)} \frac{1}{n+1} I(X^n \rightarrow Y^n)$$

where $\overrightarrow{\mathcal{Q}}(D)$ is the distortion fidelity constraint and

$$\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i),$$

$$P_{X_i|X^{i-1},Y^{i-1}}(dx_i|x^{i-1},y^{i-1}) = P_{X_i|X^{i-1}}(dx_i|x^{i-1}) - a.s.$$

The complete characterization and properties of the above extremum problems require extensive analysis of the functional $\mathbb{I}_{X^n \rightarrow Y^n}(\cdot,\cdot)$ as defined in (3). This is analogous to capacity of channels without feedback, which involves maximization of mutual information $I(X^n;Y^n)$ over the power constraint set, and to the classical rate distortion function, which involves minimization of mutual information $I(X^n;Y^n)$ over the fidelity constraint. However, mutual information $I(X^n;Y^n) \equiv \mathbb{I}_{X^n;Y^n}(P_{X^n}, P_{Y^n|X^n})$ inherits from its information divergence definition $I(X^n;Y^n) = \mathbb{D}(P_{X^n,Y^n} \| P_{X^n} \times P_{Y^n})$ several important functional properties such as convexity, concavity, lower semicontinuity, etc. These properties are vital both for finite alphabet spaces [11] and for abstract alphabet spaces [12], [13]. The difficulty associated with directed information $I(X^n \rightarrow Y^n)$ arises from the fact that this information measure (1)-(3) is a functional $\mathbb{I}_{X^n \rightarrow Y^n}(\cdot,\cdot)$ of the collections of conditional distributions $\{P_{X_i|X^{i-1},Y^{i-1}}(\cdot|\cdot,\cdot) : i = 0, 1, \ldots, n\}$ and $\{P_{Y_i|Y^{i-1},X^i}(\cdot|\cdot,\cdot) : i = 0, 1, \ldots, n\}$.
The objective of this paper is to address the following questions, when $\mathcal{X}_{0,n}$ and $\mathcal{Y}_{0,n}$ are complete separable metric spaces (Polish spaces).
1. Is there an equivalent directed information definition expressed via information divergence $\mathbb{D}(\cdot\|\cdot)$ as a functional of two appropriate conditional distributions ${\bf P}(\cdot|{\bf y})$ on $\mathcal{X}^{\mathbb{N}} \triangleq \times_{i \in \mathbb{N}} \mathcal{X}_i$, for ${\bf y} = (y_0, y_1, \ldots) \in \mathcal{Y}^{\mathbb{N}} \triangleq \times_{i \in \mathbb{N}} \mathcal{Y}_i$, and ${\bf Q}(\cdot|{\bf x})$ on $\mathcal{Y}^{\mathbb{N}}$ for ${\bf x} \in \mathcal{X}^{\mathbb{N}}$, which uniquely define $\{P_{X_i|X^{i-1},Y^{i-1}} : i = 0, 1, \ldots\}$ and $\{P_{Y_i|Y^{i-1},X^i} : i = 0, 1, \ldots\}$, respectively, and vice-versa?

2. Is directed information a convex or concave functional with respect to the conditional distributions ${\bf P}(\cdot|{\bf y})$ and ${\bf Q}(\cdot|{\bf x})$?

3. Is directed information a lower semicontinuous functional of the conditional distributions ${\bf P}(\cdot|{\bf y})$ and ${\bf Q}(\cdot|{\bf x})$?

4. What are appropriate conditions on the abstract spaces $\mathcal{X}_{0,n}$ and $\mathcal{Y}_{0,n}$ under which existence of the maximizing encoder admissible distributions and minimizing distortion admissible distributions can be sought?

Complete answers to the above questions are provided by invoking the topology of weak convergence of probability measures on Polish spaces and Prohorov's theorems. The derivations are only outlined since they are quite lengthy.

II. Causal Channels on Abstract Spaces 

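In this section, the aim is to establish two equivalent definitions of conditional distributions or basic processes, which define any probabilistic channel with causal feedback and relate causally the input-output behavior of any channel. This formulation is necessary to investigate questions 1-4.
Let $\mathbb{N} = \{0, 1, 2, \ldots\}$, and $\mathbb{N}^n = \{0, 1, 2, \ldots, n\}$. Introduce two sequences of spaces $\{(\mathcal{X}_n, \mathcal{B}(\mathcal{X}_n)) : n \in \mathbb{N}\}$ and $\{(\mathcal{Y}_n, \mathcal{B}(\mathcal{Y}_n)) : n \in \mathbb{N}\}$, where $\mathcal{X}_n, \mathcal{Y}_n$, $n \in \mathbb{N}$, are topological spaces, and $\mathcal{B}(\mathcal{X}_n)$ and $\mathcal{B}(\mathcal{Y}_n)$ are Borel $\sigma$-algebras of subsets of $\mathcal{X}_n$ and $\mathcal{Y}_n$, respectively. Points in $\mathcal{X}^{\mathbb{N}} = \times_{n \in \mathbb{N}} \mathcal{X}_n$, $\mathcal{Y}^{\mathbb{N}} = \times_{n \in \mathbb{N}} \mathcal{Y}_n$ are denoted by ${\bf x} = \{x_0, x_1, \ldots\} \in \mathcal{X}^{\mathbb{N}}$, ${\bf y} = \{y_0, y_1, \ldots\} \in \mathcal{Y}^{\mathbb{N}}$, respectively, while their restrictions to finite coordinates are denoted by $x^n = \{x_0, x_1, \ldots, x_n\} \in \mathcal{X}_{0,n}$, $y^n = \{y_0, y_1, \ldots, y_n\} \in \mathcal{Y}_{0,n}$, for $n \in \mathbb{N}$.
Let $\mathcal{B}(\mathcal{X}^{\mathbb{N}}) = \odot_{i \in \mathbb{N}} \mathcal{B}(\mathcal{X}_i)$ denote the $\sigma$-algebra in $\mathcal{X}^{\mathbb{N}}$ generated by cylinder sets $\{{\bf x} = (x_0, x_1, \ldots) \in \mathcal{X}^{\mathbb{N}} : x_0 \in A_0, x_1 \in A_1, \ldots, x_n \in A_n\}$, $A_i \in \mathcal{B}(\mathcal{X}_i)$, $0 \leq i \leq n$, $n \geq 1$, and similarly for $\mathcal{B}(\mathcal{Y}^{\mathbb{N}}) = \odot_{i \in \mathbb{N}} \mathcal{B}(\mathcal{Y}_i)$.

Hence, $\mathcal{B}(\mathcal{X}_{0,n})$ and $\mathcal{B}(\mathcal{Y}_{0,n})$ denote the $\sigma$-algebras of cylinder sets in $\mathcal{X}^{\mathbb{N}}$ and $\mathcal{Y}^{\mathbb{N}}$, respectively, with bases over $A_i \in \mathcal{B}(\mathcal{X}_i)$, and $B_i \in \mathcal{B}(\mathcal{Y}_i)$, $0 \leq i \leq n$, respectively.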

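Backward or Feedback Channel. Suppose for each $n \in \mathbb{N}$, the distributions $\{p_n(dx_n|x^{n-1},y^{n-1}) : n \in \mathbb{N}\}$ with $p_0(dx_0|x^{-1},y^{-1}) = p_0(dx_0)$ satisfy the following conditions.

i) For $n \in \mathbb{N}$, $p_n(\cdot|x^{n-1},y^{n-1})$ is a probability measure on $\mathcal{B}(\mathcal{X}_n)$;

ii) For $n \in \mathbb{N}$, $A_n \in \mathcal{B}(\mathcal{X}_n)$, $p_n(A_n|x^{n-1},y^{n-1})$ is a $\odot_{i=0}^{n-1}\big(\mathcal{B}(\mathcal{X}_i) \odot \mathcal{B}(\mathcal{Y}_i)\big)$-measurable function of $x^{n-1} \in \mathcal{X}_{0,n-1}$, $y^{n-1} \in \mathcal{Y}_{0,n-1}$.

Given the collection $\{p_n(dx_n|x^{n-1},y^{n-1}) : n \in \mathbb{N}\}$ satisfying conditions i), ii), one can construct a family of distributions on $(\mathcal{X}^{\mathbb{N}}, \mathcal{B}(\mathcal{X}^{\mathbb{N}})) = (\times_{i \in \mathbb{N}} \mathcal{X}_i, \odot_{i \in \mathbb{N}} \mathcal{B}(\mathcal{X}_i))$ as follows.

Let $C \in \mathcal{B}(\mathcal{X}_{0,n})$ be a cylinder set of the form $C = \{{\bf x} \in \mathcal{X}^{\mathbb{N}} : x_0 \in C_0, x_1 \in C_1, \ldots, x_n \in C_n\}$, $C_i \in \mathcal{B}(\mathcal{X}_i)$, $0 \leq i \leq n$.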



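Define a family of measures ${\bf P}(\cdot|{\bf y})$ on $\mathcal{B}(\mathcal{X}^{\mathbb{N}})$ by

$${\bf P}(C|{\bf y}) \triangleq \int_{C_0} p_0(dx_0) \ldots \int_{C_n} p_n(dx_n|x^{n-1},y^{n-1}) \tag{4}$$

$$\equiv \overleftarrow{P}_{0,n}(C_{0,n}|y^{n-1}), \quad C_{0,n} = \times_{i=0}^{n} C_i. \tag{5}$$

The notation $\overleftarrow{P}_{0,n}(\cdot|y^{n-1})$ is used to denote the restriction of the measure ${\bf P}(\cdot|{\bf y})$ on cylinder sets $C \in \mathcal{B}(\mathcal{X}_{0,n})$, for $n \in \mathbb{N}$.
Thus, if conditions i) and ii) hold then for each ${\bf y} \in \mathcal{Y}^{\mathbb{N}}$, the right hand side of (4) defines a consistent family of finite-dimensional distributions on $(\mathcal{X}^{\mathbb{N}}, \mathcal{B}(\mathcal{X}^{\mathbb{N}}))$, and hence there exists a unique measure on $(\mathcal{X}^{\mathbb{N}}, \mathcal{B}(\mathcal{X}^{\mathbb{N}}))$, from which $p_n(dx_n|x^{n-1},y^{n-1})$ is obtained. This leads to the first, usual definition of a feedback channel, as a family of functions $p_n(dx_n|x^{n-1},y^{n-1})$ satisfying conditions i) and ii).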
An alternative, equivalent definition of a feedback channel is 
established as follows. Introduce the assumption 

iii) $\{\mathcal{X}_n : n \in \mathbb{N}\}$ are complete separable metric spaces (Polish spaces) and $\{\mathcal{B}(\mathcal{X}_n) : n \in \mathbb{N}\}$ are the $\sigma$-algebras of Borel sets.

Consider a family of measures ${\bf P}(\cdot|{\bf y})$ on $(\mathcal{X}^{\mathbb{N}}, \mathcal{B}(\mathcal{X}^{\mathbb{N}}))$ satisfying the following consistency condition.
C1: If $E \in \mathcal{B}(\mathcal{X}_{0,n})$, then ${\bf P}(E|{\bf y})$ is a $\mathcal{B}(\mathcal{Y}_{0,n-1})$-measurable function of ${\bf y} \in \mathcal{Y}^{\mathbb{N}}$.

Then, by assumption iii), for any family of measures ${\bf P}(\cdot|{\bf y})$ satisfying C1 one can construct a collection of versions of conditional distributions $\{p_n(dx_n|x^{n-1},y^{n-1}) : n \in \mathbb{N}\}$ satisfying conditions i) and ii) which are connected with ${\bf P}(\cdot|{\bf y})$ via relation (4).

Therefore, for Polish spaces $\{\mathcal{X}_n : n \in \mathbb{N}\}$ the second definition is given by a family of measures ${\bf P}(\cdot|{\bf y})$ on $(\mathcal{X}^{\mathbb{N}}, \mathcal{B}(\mathcal{X}^{\mathbb{N}}))$ depending parametrically on ${\bf y} \in \mathcal{Y}^{\mathbb{N}}$ and satisfying the consistency condition C1.
The point to be made here is that the second, equivalent definition of a feedback channel, together with the analogous definition for the forward channel, makes it convenient to define directed information via relative entropy, as is done for mutual information, and to extend well-known functional properties of mutual information to directed information.

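Forward Channel. The previous methodology is repeated for the collection of functions $\{q_n(dy_n|y^{n-1},x^n) : n \in \mathbb{N}\}$ which satisfy the following conditions.

iv) For $n \in \mathbb{N}$, $q_n(\cdot|y^{n-1},x^n)$ is a probability measure on $\mathcal{B}(\mathcal{Y}_n)$;

v) For $n \in \mathbb{N}$, $B_n \in \mathcal{B}(\mathcal{Y}_n)$, $q_n(B_n|y^{n-1},x^n)$ is a $\big(\odot_{i=0}^{n-1} \mathcal{B}(\mathcal{Y}_i)\big) \odot \big(\odot_{i=0}^{n} \mathcal{B}(\mathcal{X}_i)\big)$-measurable function of $x^n \in \mathcal{X}_{0,n}$, $y^{n-1} \in \mathcal{Y}_{0,n-1}$.

Similarly as before, given a cylinder set $D \in \mathcal{B}(\mathcal{Y}_{0,n})$ of the form $D = \{{\bf y} \in \mathcal{Y}^{\mathbb{N}} : y_0 \in D_0, y_1 \in D_1, \ldots, y_n \in D_n\}$, $D_i \in \mathcal{B}(\mathcal{Y}_i)$, $0 \leq i \leq n$, define a family of measures ${\bf Q}(\cdot|{\bf x})$ on $\mathcal{B}(\mathcal{Y}^{\mathbb{N}})$ by

$${\bf Q}(D|{\bf x}) \triangleq \int_{D_0} q_0(dy_0|x_0) \ldots \int_{D_n} q_n(dy_n|y^{n-1},x^n) \tag{6}$$

$$\equiv \overrightarrow{Q}_{0,n}(D_{0,n}|x^n), \quad D_{0,n} = \times_{i=0}^{n} D_i. \tag{7}$$

For each ${\bf x} \in \mathcal{X}^{\mathbb{N}}$ the right hand side of (6) defines a consistent family of finite-dimensional distributions, hence there exists a unique measure on $(\mathcal{Y}^{\mathbb{N}}, \mathcal{B}(\mathcal{Y}^{\mathbb{N}}))$ from which the family of distributions $\{q_n(dy_n|y^{n-1},x^n) : n \in \mathbb{N}\}$ is obtained.
Introduce the assumption

vi) $\{\mathcal{Y}_n : n \in \mathbb{N}\}$ are Polish spaces and $\{\mathcal{B}(\mathcal{Y}_n) : n \in \mathbb{N}\}$ are the $\sigma$-algebras of Borel sets.

Consider a family of measures ${\bf Q}(\cdot|{\bf x})$ on $(\mathcal{Y}^{\mathbb{N}}, \mathcal{B}(\mathcal{Y}^{\mathbb{N}}))$ satisfying the following consistency condition.

C2: If $F \in \mathcal{B}(\mathcal{Y}_{0,n})$, then ${\bf Q}(F|{\bf x})$ is a $\mathcal{B}(\mathcal{X}_{0,n})$-measurable function of ${\bf x} \in \mathcal{X}^{\mathbb{N}}$.

Then, by assumption vi), for any family of measures ${\bf Q}(\cdot|{\bf x})$ on $(\mathcal{Y}^{\mathbb{N}}, \mathcal{B}(\mathcal{Y}^{\mathbb{N}}))$ satisfying consistency condition C2 one can construct a collection of functions $\{q_n(dy_n|y^{n-1},x^n) : n \in \mathbb{N}\}$ satisfying conditions iv) and v) which are connected with ${\bf Q}(\cdot|{\bf x})$ via relation (6).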

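Given the basic measures ${\bf P}(\cdot|{\bf y})$ on $\mathcal{X}^{\mathbb{N}}$ and ${\bf Q}(\cdot|{\bf x})$ on $\mathcal{Y}^{\mathbb{N}}$ satisfying consistency conditions C1 and C2, respectively, construct the collections of conditional distributions as follows.
Let $A^{(n)} = \{{\bf x} : x_n \in A\}$, $A \in \mathcal{B}(\mathcal{X}_n)$ and $B^{(n)} = \{{\bf y} : y_n \in B\}$, $B \in \mathcal{B}(\mathcal{Y}_n)$. In addition, let ${\bf P}(A^{(n)}|{\bf y}|\mathcal{B}(\mathcal{X}_{0,n-1}))$ denote the conditional probability of $A^{(n)}$ with respect to $\mathcal{B}(\mathcal{X}_{0,n-1})$ calculated on the probability space $(\mathcal{X}^{\mathbb{N}}, \mathcal{B}(\mathcal{X}^{\mathbb{N}}), {\bf P}(\cdot|{\bf y}))$, and ${\bf Q}(B^{(n)}|{\bf x}|\mathcal{B}(\mathcal{Y}_{0,n-1}))$ denote the conditional probability of $B^{(n)}$ with respect to $\mathcal{B}(\mathcal{Y}_{0,n-1})$ calculated on the probability space $(\mathcal{Y}^{\mathbb{N}}, \mathcal{B}(\mathcal{Y}^{\mathbb{N}}), {\bf Q}(\cdot|{\bf x}))$. Then

$${\bf P}(\{{\bf x} : x_n \in A\}|{\bf y}|\mathcal{B}(\mathcal{X}_{0,n-1})) = p_n(A; x^{n-1}, y^{n-1}) - a.s.$$

$${\bf Q}(\{{\bf y} : y_n \in B\}|{\bf x}|\mathcal{B}(\mathcal{Y}_{0,n-1})) = q_n(B; y^{n-1}, x^n) - a.s.$$

Note that $p_n(\cdot;\cdot,\cdot) \in \mathcal{Q}(\mathcal{X}_n; \mathcal{X}_{0,n-1} \times \mathcal{Y}_{0,n-1})$ and $q_n(\cdot;\cdot,\cdot) \in \mathcal{Q}(\mathcal{Y}_n; \mathcal{Y}_{0,n-1} \times \mathcal{X}_{0,n})$ are stochastic kernels [14], determined from ${\bf P}(\cdot|\cdot)$ and ${\bf Q}(\cdot|\cdot)$, respectively (e.g., related via (4) and (6)).
The distribution of RV's $\{(X_i, Y_i) : i \in \mathbb{N}\}$ is defined by

$$\mathbb{P}\{X_0 \in A_0, Y_0 \in B_0, \ldots, X_n \in A_n, Y_n \in B_n\} = \int_{A_0} p_0(dx_0) \int_{B_0} q_0(dy_0; x_0) \ldots \int_{A_n} p_n(dx_n; x^{n-1}, y^{n-1}) \int_{B_n} q_n(dy_n; y^{n-1}, x^n).$$

Hence, for any ${\bf P}(\cdot|\cdot)$ and ${\bf Q}(\cdot|\cdot)$ satisfying the consistency conditions there exist a probability space and a sequence of RV's $\{(X_i, Y_i) : i \in \mathbb{N}\}$ defined on it, whose joint probability distribution is defined uniquely via ${\bf P}(\cdot|\cdot)$ and ${\bf Q}(\cdot|\cdot)$.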

III. Directed Information Properties and 
Compactness 

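In this section, directed information $I(X^n \rightarrow Y^n)$ is defined via relative entropy, using the basic measures ${\bf P}(\cdot|{\bf y})$ and ${\bf Q}(\cdot|{\bf x})$, and its properties are identified. Define

$$\mathcal{Q}^{\bf C1}(\mathcal{X}^{\mathbb{N}}; \mathcal{Y}^{\mathbb{N}}) \triangleq \big\{ {\bf P}(\cdot|{\bf y}) \in \mathcal{M}_1(\mathcal{X}^{\mathbb{N}}) : {\bf P}(\cdot|{\bf y}) \text{ are regular probability measures and satisfy consistency condition C1} \big\},$$

$$\mathcal{Q}^{\bf C2}(\mathcal{Y}^{\mathbb{N}}; \mathcal{X}^{\mathbb{N}}) \triangleq \big\{ {\bf Q}(\cdot|{\bf x}) \in \mathcal{M}_1(\mathcal{Y}^{\mathbb{N}}) : {\bf Q}(\cdot|{\bf x}) \text{ are regular probability measures and satisfy consistency condition C2} \big\}.$$

Given conditional distributions ${\bf P}(\cdot|\cdot) \in \mathcal{Q}^{\bf C1}(\mathcal{X}^{\mathbb{N}}; \mathcal{Y}^{\mathbb{N}})$ and ${\bf Q}(\cdot|\cdot) \in \mathcal{Q}^{\bf C2}(\mathcal{Y}^{\mathbb{N}}; \mathcal{X}^{\mathbb{N}})$ define the following measures.
P1: The joint distribution on $\mathcal{X}^{\mathbb{N}} \times \mathcal{Y}^{\mathbb{N}}$ defined uniquely by

$$(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(\times_{i=0}^{n}(A_i \times B_i)) = \mathbb{P}\{X_0 \in A_0, Y_0 \in B_0, \ldots, X_n \in A_n, Y_n \in B_n\}, \quad A_i \in \mathcal{B}(\mathcal{X}_i),\ B_i \in \mathcal{B}(\mathcal{Y}_i). \tag{8}$$

Formally, (8) is written as $(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(dx^n, dy^n)$.
P2: The marginal distributions on $\mathcal{X}^{\mathbb{N}}$ defined uniquely by

$$\mu_{0,n}(\times_{i=0}^{n} A_i) = \mathbb{P}\{X_0 \in A_0, Y_0 \in \mathcal{Y}_0, \ldots, X_n \in A_n, Y_n \in \mathcal{Y}_n\}, \quad A_i \in \mathcal{B}(\mathcal{X}_i),\ 0 \leq i \leq n.$$

P3: The marginal distributions on $\mathcal{Y}^{\mathbb{N}}$ defined uniquely by

$$\nu_{0,n}(\times_{i=0}^{n} B_i) = \mathbb{P}\{X_0 \in \mathcal{X}_0, Y_0 \in B_0, \ldots, X_n \in \mathcal{X}_n, Y_n \in B_n\} = (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(\times_{i=0}^{n}(\mathcal{X}_i \times B_i)), \quad B_i \in \mathcal{B}(\mathcal{Y}_i),\ 0 \leq i \leq n.$$

P4: The measure $\overrightarrow{\Pi}_{0,n} : \mathcal{B}(\mathcal{X}_{0,n}) \odot \mathcal{B}(\mathcal{Y}_{0,n}) \mapsto [0,1]$ defined uniquely for $A_i \in \mathcal{B}(\mathcal{X}_i)$, $B_i \in \mathcal{B}(\mathcal{Y}_i)$, $0 \leq i \leq n$ by

$$\overrightarrow{\Pi}_{0,n}(\times_{i=0}^{n}(A_i \times B_i)) = (\overleftarrow{P}_{0,n} \otimes \nu_{0,n})(\times_{i=0}^{n}(A_i \times B_i)).$$

P5: The measure $\overleftarrow{\Pi}_{0,n} : \mathcal{B}(\mathcal{Y}_{0,n}) \odot \mathcal{B}(\mathcal{X}_{0,n}) \mapsto [0,1]$ defined uniquely for $A_i \in \mathcal{B}(\mathcal{X}_i)$, $B_i \in \mathcal{B}(\mathcal{Y}_i)$, $0 \leq i \leq n$ by

$$\overleftarrow{\Pi}_{0,n}(\times_{i=0}^{n}(A_i \times B_i)) = (\overrightarrow{Q}_{0,n} \otimes \mu_{0,n})(\times_{i=0}^{n}(A_i \times B_i)).$$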

A. Directed Information 

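Let ${\bf P}(\cdot|\cdot) \in \mathcal{Q}^{\bf C1}(\mathcal{X}^{\mathbb{N}}; \mathcal{Y}^{\mathbb{N}})$ and ${\bf Q}(\cdot|\cdot) \in \mathcal{Q}^{\bf C2}(\mathcal{Y}^{\mathbb{N}}; \mathcal{X}^{\mathbb{N}})$.
By invoking the definition of directed information, it can be shown (using the chain rule of relative entropy and the relation between absolute continuity of measures) [14] that directed information is equivalently given by

$$I(X^n \rightarrow Y^n) = \mathbb{D}(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n} \,\|\, \overrightarrow{\Pi}_{0,n}) \tag{9}$$

$$= \int \log \left( \frac{\overrightarrow{Q}_{0,n}(dy^n|x^n)}{\nu_{0,n}(dy^n)} \right) (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(dx^n, dy^n) \equiv \mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n}). \tag{10}$$

The notation $\mathbb{I}_{X^n \rightarrow Y^n}(\cdot,\cdot)$ indicates the functional dependence of $I(X^n \rightarrow Y^n)$ on $\{\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n}\}$. The investigation of the functional properties of directed information is done via $\mathbb{I}_{X^n \rightarrow Y^n}(\cdot,\cdot)$.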
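The equivalence between the sum in (1) and the divergence form (9)-(10) can be checked numerically on finite alphabets. The sketch below is only an illustration: the binary kernels `p_kernel` and `q_kernel` are hypothetical choices (a feedback rule and a channel with memory), all function names are assumptions, and sums replace the integrals of the general framework. It builds the joint measure from the two kernel collections, evaluates (10), and compares the result with (1) computed from marginals of the joint alone.

```python
import math
from itertools import product

BITS = (0, 1)


def p_kernel(i, x_i, x_prev, y_prev):
    """Hypothetical feedback rule p_i(x_i | x^{i-1}, y^{i-1})."""
    if i == 0:
        return 0.5                                # uniform first input
    return 0.8 if x_i == y_prev[-1] else 0.2      # tends to repeat the last output


def q_kernel(i, y_i, y_prev, x_hist):
    """Hypothetical channel with memory q_i(y_i | y^{i-1}, x^i)."""
    flip = 0.1 if (i == 0 or y_prev[-1] == 0) else 0.3
    return 1.0 - flip if y_i == x_hist[-1] else flip


def build(n):
    """Return the joint pmf of (x^n, y^n) and the forward product Q(y^n | x^n)."""
    joint, forward = {}, {}
    for x in product(BITS, repeat=n + 1):
        for y in product(BITS, repeat=n + 1):
            p = q = 1.0
            for i in range(n + 1):
                p *= p_kernel(i, x[i], x[:i], y[:i])
                q *= q_kernel(i, y[i], y[:i], x[:i + 1])
            joint[(x, y)] = p * q
            forward[(x, y)] = q
    return joint, forward


def divergence_form(joint, forward):
    """Finite-alphabet version of (10): E[log Q(y^n | x^n) / nu(y^n)]."""
    nu = {}
    for (x, y), p in joint.items():
        nu[y] = nu.get(y, 0.0) + p
    return sum(p * math.log(forward[(x, y)] / nu[y])
               for (x, y), p in joint.items() if p > 0.0)


def sum_form(joint, n):
    """Finite-alphabet version of (1), using only marginals of the joint pmf."""
    def marg(i, j):
        out = {}
        for (x, y), p in joint.items():
            key = (x[:i + 1], y[:j + 1])
            out[key] = out.get(key, 0.0) + p
        return out

    total = 0.0
    for i in range(n + 1):
        p_xy, p_xy_prev = marg(i, i), marg(i, i - 1)
        p_y, p_y_prev = marg(-1, i), marg(-1, i - 1)
        for (x, y), p in p_xy.items():
            if p > 0.0:
                num = p * p_y_prev[((), y[:-1])]
                den = p_xy_prev[(x, y[:-1])] * p_y[((), y)]
                total += p * math.log(num / den)
    return total


n = 2
joint, forward = build(n)
kl_value = divergence_form(joint, forward)
sum_value = sum_form(joint, n)
```

The two values `kl_value` and `sum_value` agree up to rounding, because the conditional distribution of $Y_i$ given $(Y^{i-1}, X^i)$ under the constructed joint measure is exactly the kernel $q_i$.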

B. Convexity and Concavity of Directed Information 

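Let $\mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$, $\mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ be the restrictions of $\mathcal{Q}^{\bf C1}(\mathcal{X}^{\mathbb{N}}; \mathcal{Y}^{\mathbb{N}})$ and $\mathcal{Q}^{\bf C2}(\mathcal{Y}^{\mathbb{N}}; \mathcal{X}^{\mathbb{N}})$, respectively, to cylinder sets with bases over $A_i \in \mathcal{B}(\mathcal{X}_i)$, and $B_i \in \mathcal{B}(\mathcal{Y}_i)$, $i = 0, 1, \ldots, n$. These are regular conditional distributions.

Theorem 3.1: Let $\{(\mathcal{X}_n, \mathcal{B}(\mathcal{X}_n)) : n \in \mathbb{N}\}$, $\{(\mathcal{Y}_n, \mathcal{B}(\mathcal{Y}_n)) : n \in \mathbb{N}\}$ be Polish spaces. Then

1) $\mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$, $\mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ are convex sets.

2) $\mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n})$ is a convex functional of $\overrightarrow{Q}_{0,n} \in \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ for a fixed $\overleftarrow{P}_{0,n} \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$.

3) $\mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n})$ is a concave functional of $\overleftarrow{P}_{0,n} \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$ for a fixed $\overrightarrow{Q}_{0,n} \in \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$.

Proof: 1) Follows from the convexity of the set of regular conditional distributions, since $\mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$, $\mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ are the subsets satisfying consistency conditions C1, C2, respectively. 2), 3) follow from the log-sum formula. ■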
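Parts 2) and 3) of Theorem 3.1 can be probed numerically for finite alphabets. The following sketch, assuming binary alphabets and $n = 1$, draws random causal kernels, forms convex combinations of the resulting $\overleftarrow{P}_{0,1}$ and $\overrightarrow{Q}_{0,1}$ tables, and checks the convexity and concavity inequalities up to rounding. All function names are hypothetical, and a passing check is an illustration, not a proof.

```python
import math
import random
from itertools import product

BITS = (0, 1)


def rand_dist(rng):
    """Random pmf on {0, 1}."""
    a = rng.random()
    return {0: a, 1: 1.0 - a}


def random_feedback_channel(rng):
    """Table of P(x0, x1 | y0) = p0(x0) p1(x1 | x0, y0), keyed by (x0, x1, y0)."""
    p0 = rand_dist(rng)
    p1 = {(x0, y0): rand_dist(rng) for x0 in BITS for y0 in BITS}
    return {(x0, x1, y0): p0[x0] * p1[(x0, y0)][x1]
            for x0, x1, y0 in product(BITS, repeat=3)}


def random_forward_channel(rng):
    """Table of Q(y0, y1 | x0, x1) = q0(y0 | x0) q1(y1 | y0, x0, x1)."""
    q0 = {x0: rand_dist(rng) for x0 in BITS}
    q1 = {key: rand_dist(rng) for key in product(BITS, repeat=3)}  # (y0, x0, x1)
    return {(y0, y1, x0, x1): q0[x0][y0] * q1[(y0, x0, x1)][y1]
            for y0, y1, x0, x1 in product(BITS, repeat=4)}


def directed_info(P, Q):
    """Finite-alphabet form of (10): sum of P*Q*log(Q / nu), nu the Y-marginal."""
    joint = {(x0, x1, y0, y1): P[(x0, x1, y0)] * Q[(y0, y1, x0, x1)]
             for x0, x1, y0, y1 in product(BITS, repeat=4)}
    nu = {}
    for (x0, x1, y0, y1), p in joint.items():
        nu[(y0, y1)] = nu.get((y0, y1), 0.0) + p
    return sum(p * math.log(Q[(y0, y1, x0, x1)] / nu[(y0, y1)])
               for (x0, x1, y0, y1), p in joint.items() if p > 0.0)


def mix(A, B, lam):
    """Convex combination lam * A + (1 - lam) * B of two tables with equal keys."""
    return {k: lam * A[k] + (1.0 - lam) * B[k] for k in A}


def check_theorem_3_1(trials=200, seed=0, tol=1e-9):
    """Return True if no sampled instance violates convexity in Q or concavity in P."""
    rng = random.Random(seed)
    for _ in range(trials):
        P1, P2 = random_feedback_channel(rng), random_feedback_channel(rng)
        Q1, Q2 = random_forward_channel(rng), random_forward_channel(rng)
        lam = rng.random()
        convex_gap = (lam * directed_info(P1, Q1)
                      + (1.0 - lam) * directed_info(P1, Q2)
                      - directed_info(P1, mix(Q1, Q2, lam)))
        concave_gap = (directed_info(mix(P1, P2, lam), Q1)
                       - lam * directed_info(P1, Q1)
                       - (1.0 - lam) * directed_info(P2, Q1))
        if convex_gap < -tol or concave_gap < -tol:
            return False
    return True
```

Note that the convex combinations are taken of the product tables $\overleftarrow{P}_{0,1}$ and $\overrightarrow{Q}_{0,1}$, not of the individual kernels, which matches the sets in part 1) of the theorem.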



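C. Lower Semicontinuity and Continuity of Directed Information

This part discusses the lower semicontinuity and continuity of directed information as a functional of $\overleftarrow{P}_{0,n}(\cdot|y^{n-1}) \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$ and $\overrightarrow{Q}_{0,n}(\cdot|x^n) \in \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, with respect to the topology of weak convergence of probability measures. Before establishing the main results, sufficient conditions for weak compactness of the sets of measures $\mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$, $\mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, and of the joint and marginal measures are given.
Theorem 3.2:

Part A. Let $\mathcal{Y}_{0,n}$ be a compact Polish space and $\mathcal{X}_{0,n}$ a Polish space. Assume $\overleftarrow{P}_{0,n}(\cdot|y^{n-1}) \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$ satisfies the following condition.

CA: For all $g(\cdot) \in BC(\mathcal{X}_{0,n})$, where $BC(\mathcal{X}_{0,n})$ denotes the set of bounded continuous real-valued functions on $\mathcal{X}_{0,n}$, the function

$$(x^{n-1}, y^{n-1}) \mapsto \int g(x)\, p_n(dx; x^{n-1}, y^{n-1}) \in \mathbb{R} \tag{11}$$

is jointly continuous in $(x^{n-1}, y^{n-1}) \in \mathcal{X}_{0,n-1} \times \mathcal{Y}_{0,n-1}$.
Then the following weak convergence results hold.

A1) Let $\overleftarrow{P}_{0,n}(\cdot|y^{n-1}) \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$ and $\{\overrightarrow{Q}^{\alpha}_{0,n}(\cdot|x^n)\}_{\alpha \geq 1} \subset \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$. Then the joint measure $(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}^{\alpha}_{0,n})(dx^n, dy^n) \Rightarrow (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}^{0}_{0,n})(dx^n, dy^n)$, where $\overrightarrow{Q}^{\alpha}_{0,n}(\cdot|x^n) \Rightarrow \overrightarrow{Q}^{0}_{0,n}(\cdot|x^n)$.

A2) Let $\overleftarrow{P}_{0,n}(\cdot|y^{n-1}) \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$ and $\{\overrightarrow{Q}^{\alpha}_{0,n}(\cdot|x^n)\}_{\alpha \geq 1} \subset \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, and define the family of joint measures $\{(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}^{\alpha}_{0,n})(dx^n, dy^n)\}_{\alpha \geq 1}$ having marginals $\{\nu^{\alpha}_{0,n}\}_{\alpha \geq 1}$ on $\mathcal{Y}_{0,n}$ and $\{\mu^{\alpha}_{0,n}\}_{\alpha \geq 1}$ on $\mathcal{X}_{0,n}$. Then $\nu^{\alpha}_{0,n}(dy^n) \Rightarrow \nu^{0}_{0,n}(dy^n)$ and $\mu^{\alpha}_{0,n}(dx^n) \Rightarrow \mu^{0}_{0,n}(dx^n)$, where $\nu^{0}_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$ and $\mu^{0}_{0,n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$ are the marginals of $(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}^{0}_{0,n})(dx^n, dy^n)$.

A3) The sets of measures $\mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$ and $\mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ are weakly compact.

A4) Let $\overleftarrow{P}_{0,n}(\cdot|y^{n-1}) \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$, $\{\overrightarrow{Q}^{\alpha}_{0,n}(\cdot|x^n)\}_{\alpha \geq 1} \subset \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, and let $\{\nu^{\alpha}_{0,n}\}_{\alpha \geq 1}$ be the marginals of $\{(\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}^{\alpha}_{0,n})(dx^n, dy^n)\}_{\alpha \geq 1}$. Then $\overrightarrow{\Pi}^{\alpha}_{0,n}(dx^n, dy^n) = \overleftarrow{P}_{0,n}(dx^n|y^{n-1}) \otimes \nu^{\alpha}_{0,n}(dy^n) \Rightarrow \overleftarrow{P}_{0,n}(dx^n|y^{n-1}) \otimes \nu^{0}_{0,n}(dy^n)$, where $\nu^{0}_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$ is the limit of $\{\nu^{\alpha}_{0,n}\}_{\alpha \geq 1}$.

Part B. Let $\mathcal{X}_{0,n}$ be a compact Polish space and $\mathcal{Y}_{0,n}$ a Polish space. Assume $\overrightarrow{Q}_{0,n}(\cdot|x^n) \in \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ satisfies the following condition.
CB: For all $h(\cdot) \in BC(\mathcal{Y}_{0,n})$, the function

$$(x^n, y^{n-1}) \mapsto \int h(y)\, q_n(dy; y^{n-1}, x^n) \in \mathbb{R} \tag{12}$$

is jointly continuous in $(x^n, y^{n-1}) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1}$.
The statements of Part A hold by interchanging $\overrightarrow{Q}_{0,n}(\cdot|x^n)$ by $\overleftarrow{P}_{0,n}(\cdot|y^{n-1})$, $\nu_{0,n}(dy^n)$ by $\mu_{0,n}(dx^n)$, and $\overrightarrow{\Pi}_{0,n}(dx^n, dy^n)$ by $\overleftarrow{\Pi}_{0,n}(dx^n, dy^n)$.

Proof: The proof is quite lengthy and it is based on Prohorov's theorem relating tightness and weak compactness of a family of probability measures [14]. ■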
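The results of Theorem 3.2 are sufficient to establish lower semicontinuity of directed information $I(X^n \rightarrow Y^n) \equiv \mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n})$.

Theorem 3.3: 1) Suppose the conditions in Theorem 3.2 Part A hold. Then $\mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n})$ is lower semicontinuous on $\overrightarrow{Q}_{0,n} \in \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ for fixed $\overleftarrow{P}_{0,n} \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$.

2) Suppose the conditions in Theorem 3.2 Part B hold. Then $\mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n})$ is lower semicontinuous on $\overleftarrow{P}_{0,n} \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$ for fixed $\overrightarrow{Q}_{0,n} \in \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$.

Proof: Utilizes Theorem 3.2 and lower semicontinuity of relative entropy. ■
For capacity problems, it is desirable to identify conditions so that $\mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n})$ as a function of $\overleftarrow{P}_{0,n}$ for fixed $\overrightarrow{Q}_{0,n}$ is either upper semicontinuous or continuous. Continuity of directed information is established by generalizing the derivation in [13].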

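Theorem 3.4: Suppose the conditions in Theorem 3.2 Part B hold. Consider a forward channel $\overrightarrow{Q}_{0,n}(\cdot|x^n) \in \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, and a closed family of feedback channels $\overline{\mathcal{Q}}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1}) \subseteq \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$. Suppose there exists a family of measures $\bar{\nu}_{0,n}(dy^n)$ on $(\mathcal{Y}_{0,n}, \mathcal{B}(\mathcal{Y}_{0,n}))$ such that $\overrightarrow{Q}_{0,n}(\cdot|x^n) \ll \bar{\nu}_{0,n}(dy^n)$ with Radon-Nikodym derivatives $\xi_{0,n}(x^n, y^n) \triangleq \frac{d\overrightarrow{Q}_{0,n}(\cdot|x^n)}{d\bar{\nu}_{0,n}(\cdot)}(y^n)$. Furthermore, suppose the following conditions hold.

A. The family of Radon-Nikodym derivatives $\xi_{0,n}(x^n, y^n)$ is continuous on $\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}$, and $\xi_{0,n}(x^n, y^n) \log \xi_{0,n}(x^n, y^n)$ is uniformly integrable over $\{\overleftarrow{P}_{0,n} \otimes \bar{\nu}_{0,n} : \overleftarrow{P}_{0,n} \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})\}$.

B. For a fixed $y^n \in \mathcal{Y}_{0,n}$, the Radon-Nikodym derivative $\xi_{0,n}(x^n, y^n)$ is uniformly integrable over $\mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$.

Then, the directed information $\mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n})$ as a functional of $\{\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n}\} \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1}) \times \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is bounded and weakly continuous over $\overline{\mathcal{Q}}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$.

Proof: The derivation invokes (9) and generalizes related results in [13]. ■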



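IV. Extremum Problems of Directed Information

In this section, sufficient conditions are given for the existence of solutions to the extremum problems $C^{fb}(P)$ and $R^{c}(D)$ (mentioned in the introduction).

A. Existence of Capacity Achieving Distribution

Consider a communication channel with memory and feedback $\overrightarrow{Q}_{0,n}(\cdot|x^n) \in \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ with power constraints

$$\mathcal{P}(P) \triangleq \Big\{ \overleftarrow{P}_{0,n}(\cdot|y^{n-1}) \in \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1}) : \int g_{0,n}(x^n, y^{n-1}) (\overleftarrow{P}_{0,n} \otimes \overrightarrow{Q}_{0,n})(dx^n, dy^n) \leq P \Big\}$$

where for any $n \in \mathbb{N}$, $g_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1} \mapsto [0, \infty]$ is Borel measurable. In the absence of any power constraints the set of input conditional distributions is $\mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$.
The finite horizon maximization of directed information over $\mathcal{P}(P)$ or $\mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$ (i.e., with or without power constraints) is defined by

$$C^{fb}_{0,n} \triangleq \sup_{\overleftarrow{P}_{0,n}(\cdot|y^{n-1}) \in \mathcal{P}(P) \text{ or } \mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})} \mathbb{I}_{X^n \rightarrow Y^n}(\overleftarrow{P}_{0,n}, \overrightarrow{Q}_{0,n}). \tag{13}$$

The next theorem establishes existence of the maximizer.

Theorem 4.1: Suppose that the assumptions of Theorem 3.2 Part B are satisfied.

1. The set $\mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$ is compact.

2. The set $\mathcal{P}(P)$ is a closed subset of $\mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$.

3. If in addition the assumptions of Theorem 3.4 are satisfied (here the assumption on $\overline{\mathcal{Q}}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$ is satisfied by 1. and 2.) then $C^{fb}_{0,n}$ has a maximum in $\mathcal{Q}^{\bf C1}(\mathcal{X}_{0,n}; \mathcal{Y}_{0,n-1})$ (without constraints) or in $\mathcal{P}(P)$ (with power constraints).

Proof: 1. Utilize the fact that the set of probability measures on a compact Polish space is weakly compact. 2. Utilize the fact that a closed subset of a weakly compact set is compact. 3. Follows from the Weierstrass theorem. ■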

B. Existence of Causal Rate Distortion Achieving Distribution 

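Consider a reconstruction channel $\overrightarrow{Q}_{0,n}(\cdot|x^n) \in \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ and a fixed source distribution $\mu_{0,n}(dx^n) \in \mathcal{M}_1(\mathcal{X}_{0,n})$, and define the fidelity constraint by

$$\overrightarrow{\mathcal{Q}}(D) \triangleq \Big\{ \overrightarrow{Q}_{0,n}(\cdot|x^n) \in \mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}) : \int d_{0,n}(x^n, y^n) (\overrightarrow{Q}_{0,n} \otimes \mu_{0,n})(dx^n, dy^n) \leq D \Big\} \tag{14}$$

where $D \geq 0$, and for each $n \in \mathbb{N}$ the distortion function $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \mapsto [0, \infty)$ is Borel measurable.
The finite horizon minimization of directed information over $\overrightarrow{\mathcal{Q}}(D)$ is defined by

$$R^{c}_{0,n}(D) \triangleq \inf_{\overrightarrow{Q}_{0,n}(\cdot|x^n) \in \overrightarrow{\mathcal{Q}}(D)} \mathbb{I}_{X^n \rightarrow Y^n}(\mu_{0,n}, \overrightarrow{Q}_{0,n}).$$

The next theorem establishes existence of the minimizer.

Theorem 4.2: Suppose that the assumptions of Theorem 3.2 Part A are satisfied.

1. The set $\mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ is compact.

2. The set $\overrightarrow{\mathcal{Q}}(D)$ is a closed subset of $\mathcal{Q}^{\bf C2}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$.

3. $R^{c}_{0,n}(D)$ has a minimum in $\overrightarrow{\mathcal{Q}}(D)$.

Proof: The proof is based on generalizing the derivation in [13] to n-fold convolution measures. It is done by induction. ■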

V. Conclusion 

In this paper we have provided a general framework through 
which the properties of mutual information are extended 
to directed information on Polish spaces. The existence of 
solutions to capacity problems with memory and feedback, 
and to lossy causal data compression problems is shown. 